
    Application of amino acid occurrence for discriminating different folding types of globular proteins

    Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is a long-standing goal in computational and molecular biology. Discriminating structural classes and folding types is an intermediate step in protein structure prediction.
    Results: We propose a method based on linear discriminant analysis (LDA) for discriminating 30 different folding types of globular proteins using amino acid occurrence. The method was tested on a non-redundant set of 1612 proteins and discriminated them with an accuracy of 38%, which is comparable to or better than other methods in the literature. A web server for predicting the folding type of a query protein from its amino acid sequence is available at http://granular.com/PROLDA/.
    Conclusion: Amino acid occurrence has been successfully used to discriminate different folding types of globular proteins. The discrimination accuracy obtained with amino acid occurrence is better than that obtained with amino acid composition and/or amino acid properties. In addition, the method returns results quickly.
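
The abstract contrasts amino acid occurrence with amino acid composition without defining either. The sketch below assumes that "occurrence" means the raw count of each of the 20 standard residues and "composition" means those counts divided by the number of standard residues; both function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(sequence):
    """Raw count of each standard residue (assumed meaning of 'occurrence')."""
    counts = Counter(sequence.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def composition(sequence):
    """Counts normalised by the number of standard residues in the sequence."""
    occ = occurrence(sequence)
    total = sum(occ)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [n / total for n in occ]
```

Under this reading, occurrence keeps information about sequence length that composition discards, which is one plausible reason the two feature sets could perform differently in a classifier such as LDA.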

    Functional discrimination of membrane proteins using machine learning techniques

    Background: Discriminating membrane proteins based on their functions is an important task in genome annotation. In this work, we analyzed the characteristic features of amino acid residues in membrane proteins that perform major functions: channels/pores, electrochemical potential-driven transporters and primary active transporters.
    Results: We observed that the residues Asp, Asn and Tyr are dominant in channels/pores, whereas the composition of the hydrophobic residues Phe, Gly, Ile, Leu and Val is high in electrochemical potential-driven transporters. The composition of all the amino acids in primary active transporters lies between those of the other two classes. We applied several machine learning algorithms, including Bayes rule, logistic function, neural network, support vector machine and decision tree, to discriminate these classes of proteins, and found that most of the algorithms discriminated them with similar accuracy. The neural network method discriminated channels/pores, electrochemical potential-driven transporters and active transporters with a 5-fold cross-validation accuracy of 64% on a data set of 1718 membrane proteins. Using amino acid occurrence improved the overall accuracy to 68%. In addition, we discriminated transporters from other α-helical and β-barrel membrane proteins with an accuracy of 85% using the k-nearest neighbor method. The classification of transporters versus all other proteins (globular and membrane) reached an accuracy of 82%.
    Conclusion: Discrimination with amino acid occurrence performs better than discrimination with amino acid composition. We suggest that this method could be used effectively to discriminate transporters from all other globular and membrane proteins, and to classify them into channels/pores, electrochemical and active transporters.

    Discrimination of outer membrane proteins with improved performance

    Background: Outer membrane proteins (OMPs) perform diverse functional roles in Gram-negative bacteria. Identification of outer membrane proteins is an important task.
    Results: This paper presents a method for distinguishing OMPs from non-OMPs (that is, globular proteins and inner membrane proteins (IMPs)). First, we calculated the average residue compositions of OMPs, globular proteins and IMPs separately using a training set. Then, for each protein in the test set, its distances to the three groups were calculated from residue composition using a weighted Euclidean distance (WED) approach. Test proteins were classified as OMP or non-OMP according to the smallest distance. The proposed method distinguishes OMPs from non-OMPs with 91.0% accuracy and a Matthews correlation coefficient (MCC) of 0.639. We then improved the method by including homologous sequences in the calculation of residue composition and by using a feature-selection method to select the single residues and dipeptides that were useful for OMP prediction. The final method achieves an accuracy of 96.8% with an MCC of 0.859. In direct comparisons, the proposed method outperforms previously published methods.
    Conclusion: The proposed method can identify OMPs with improved performance. It should be helpful for discovering OMPs at genome scale.
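
The classification steps described above (class-average compositions, a weighted Euclidean distance to each group, assignment by least distance, evaluation by MCC) can be sketched as follows. The abstract does not say how the weights are chosen, so here they are simply passed in by the caller; the function names and the toy two-dimensional vectors in the usage note are hypothetical.

```python
import math


def centroid(vectors):
    """Average feature vector of one class (e.g. mean residue composition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def weighted_euclidean(x, center, weights):
    """Weighted Euclidean distance; the weighting scheme is an assumption."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center)))


def classify(x, centroids, weights):
    """Return the label of the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: weighted_euclidean(x, centroids[label], weights))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

A protein would then be called an OMP when `classify(x, centroids, weights) == "OMP"`, with `centroids` holding one entry each for OMPs, globular proteins and IMPs, and a non-OMP otherwise.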

    Folding of small proteins: A matter of geometry?

    We review some of our recent results, obtained with simple lattice models and Monte Carlo simulations, that illustrate the role of native geometry in the folding kinetics of two-state folders. Comment: To appear in Molecular Physics

    Defining an Essence of Structure Determining Residue Contacts in Proteins

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little is known about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). Identifying the number and nature of essential native contacts is not only of theoretical interest (e.g., for the design of advanced statistical potentials); such a subset of spatial constraints is also very useful in a number of novel experimental methods (like EPR) that rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling", that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as few as 8% of the native contacts (Cα-Cα and Cβ-Cβ). By comparison, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
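
The random reference sets described in this abstract (random subsets covering 10% to 90% of the native contacts) can be illustrated with a short sketch. The 10% step size, the fixed seed and the function name are assumptions made for illustration; the abstract gives only the range, and the cone-peeling selection itself is not reproduced here.

```python
import random


def random_contact_subsets(contacts, fractions=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), seed=0):
    """Draw one random subset of native contacts per fraction.

    contacts: list of (i, j) residue index pairs.
    fractions: share of contacts to keep in each subset (step size assumed).
    """
    rng = random.Random(seed)  # fixed seed so the reference sets are reproducible
    pool = list(contacts)
    subsets = {}
    for fraction in fractions:
        # Keep at least one contact, and never more than are available.
        k = min(len(pool), max(1, round(fraction * len(pool))))
        subsets[fraction] = rng.sample(pool, k)
    return subsets
```

Each subset would then be fed to the reconstruction pipeline and scored by RMSD against the native structure, giving the baseline that a selected subset has to beat.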

    From Isotropic to Anisotropic Side Chain Representations: Comparison of Three Models for Residue Contact Estimation

    Choosing a criterion for residue contact is a fundamental problem in deriving knowledge-based mean-force potentials for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the -to- atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of side chains makes contact pairs hard to identify. This study compares three side chain contact models: the Atom Distance criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model, using 424 high-resolution protein structures from the Protein Data Bank. The results indicate that the ADC model is the most accurate and the ISS model the least accurate. The AES model eliminates about 95% of the contact pairs incorrectly counted by the ISS model. Algorithm analysis shows that the AES model is the most computationally intensive, while the ADC model has moderate computational cost. We derived a dataset of the contact pairs mis-estimated by the AES model. The most frequently misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for improving the AES model by incorporating pair-specific information into the cutoff distance.
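
As an illustration of the simplest criterion mentioned above, a center-to-center distance within a fixed cutoff, the sketch below lists residue pairs whose side chain centers lie within a cutoff of each other. The 6.5 Å default, the `min_separation` parameter and the function name are assumptions for illustration and are not taken from the study; the anisotropic ellipsoid model is not reproduced.

```python
import math
from itertools import combinations


def isotropic_contacts(centers, cutoff=6.5, min_separation=1):
    """Return residue index pairs whose side chain centers are within cutoff.

    centers: list of (x, y, z) side chain center coordinates, one per residue.
    cutoff: distance threshold in angstroms (hypothetical default).
    min_separation: minimum sequence separation between the two residues.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(centers), 2):
        if j - i >= min_separation and math.dist(a, b) <= cutoff:
            pairs.append((i, j))
    return pairs
```

Because this test treats every side chain as a sphere of the same effective size, it ignores side chain orientation, which is the limitation the anisotropic model in the abstract is meant to address.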

    βα-Hairpin Clamps Brace βαβ Modules and Can Make Substantive Contributions to the Stability of TIM Barrel Proteins

    Non-local hydrogen bonding interactions between main chain amide hydrogen atoms and polar side chain acceptors that bracket consecutive βα or αβ elements of secondary structure in αTS from E. coli, a TIM barrel protein, have previously been found to contribute 4–6 kcal mol⁻¹ to the stability of the native conformation. Experimental analysis of similar βα-hairpin clamps in a homologous pair of TIM barrel proteins of low sequence identity, IGPS from S. solfataricus and E. coli, reveals that this dramatic enhancement of stability is not unique to αTS. A survey of 71 TIM barrel proteins demonstrates a 4-fold symmetry for the placement of βα-hairpin clamps, bracing the fundamental βαβ building block and defining its register in the (βα)8 motif. The preferred sequences and locations of βα-hairpin clamps will enhance structure prediction algorithms and provide a strategy for engineering stability in TIM barrel proteins.